
72 research outputs found

Visualizing Gene Clusters using Neighborhood Graphs in R

Author: Leisch Friedrich
Scharl Theresa
Publication date: 01/01/2008

The visualization of cluster solutions in gene expression data analysis gives practitioners an understanding of the cluster structure of their data and makes it easier to interpret the cluster results. Neighborhood graphs allow for visual assessment of relationships between adjacent clusters. For biological reasons, the number of clusters in gene expression data is rather large. As a linear projection of the data into 2 dimensions does not scale well in the number of clusters, there is a need for new visualization techniques using non-linear arrangement of the clusters. The new visualization tool is implemented in the open source statistical computing environment R. It is demonstrated on microarray data from yeast.

Open Access LMU

Research Online

Exploratory and inferential analysis of gene cluster neighborhood graphs

Author: Leisch Friedrich
Scharl Theresa
Voglhuber Ingo
Publication venue: BioMed Central
Publication date: 01/01/2009

Many different cluster methods are frequently used in gene expression data analysis to find groups of co-expressed genes. However, cluster algorithms with the ability to visualize the resulting clusters are usually preferred. The visualization of gene clusters gives practitioners an understanding of the cluster structure of their data and makes it easier to interpret the cluster results. In this paper recent extensions of the R package gcExplorer are presented. gcExplorer is an interactive visualization toolbox for the investigation of the overall cluster structure as well as single clusters. The different visualization options, including arbitrary node and panel functions, are described in detail. Finally, the toolbox can be used to investigate the quality of a given clustering graphically as well as theoretically by testing the association between a partition and a functional group under study. It is shown that gcExplorer is a very helpful tool for a general exploration of microarray experiments. The identification of potentially interesting gene candidates or functional groups is substantially accelerated and eased. Inferential analysis on a cluster solution is used to judge its ability to provide insight into the underlying mechanistic biology of the experiment.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Open Access LMU

Research Online

Interactive visualization of clusters in microarray data: an efficient tool for improved metabolic analysis of E. coli

Author: Bayer Karl
Leisch Friedrich
Pötschacher Florentina
Scharl Theresa
Striedner Gerald
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Interpretation of comprehensive DNA microarray data sets is a challenging task for biologists and process engineers where scientific assistance of statistics and bioinformatics is essential. Interdisciplinary cooperation and concerted development of software tools for simplified and accelerated data analysis and interpretation is the key to overcome the bottleneck in data-analysis workflows. This approach is exemplified by gcExplorer, an interactive visualization toolbox based on cluster analysis. Clustering is an important tool in gene expression data analysis to find groups of co-expressed genes which can finally suggest functional pathways and interactions between genes. The visualization of gene clusters gives practitioners an understanding of the cluster structure of their data and makes it easier to interpret the cluster results. Results: In this study the interactive visualization toolbox gcExplorer is applied to the interpretation of E. coli microarray data. The data sets derive from two fed-batch experiments conducted in order to investigate the impact of different induction strategies on the host metabolism and product yield. The software enables direct graphical comparison of these two experiments. The identification of potentially interesting gene candidates or functional groups is substantially accelerated and eased. Conclusion: It was shown that gcExplorer is a very helpful tool to gain a general overview of microarray experiments. Interesting gene expression patterns can easily be found, compared among different experiments and combined with information about gene function from publicly available databases.

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Open Access LMU

Publikationsserver der Universitätsbibliothek Bodenkultur Wien

Research Online

Mixtures of Regression Models for Time-Course Gene Expression Data: Evaluation of Initialization and Random Effects

Author: Bettina Grün
Friedrich Leisch
Theresa Scharl
Publication date: 01/01/2009

Finite mixture models are routinely applied to time-course microarray data. Due to the complexity and size of this type of data, the choice of good starting values plays an important role. So far, initialization strategies have only been investigated for data from a mixture of multivariate normal distributions. In this work several initialization procedures are evaluated for mixtures of regression models with and without random effects in an extensive simulation study on different artificial datasets. Finally, these procedures are also applied to a real dataset from E. coli.

Crossref

Open Access LMU

Research Online

Prediction of the performance of pre-packed purification columns through machine learning

Author: Dimartino Simone
Jiang Qihao
Jungbauer Alois
Scharl Theresa
Schroeder Tim
Seth Sohan
Publication venue: Wiley
Publication date: 01/04/2022

Pre‐packed columns have been increasingly used in process development and biomanufacturing thanks to their ease of use and consistency. Traditionally, packing quality is predicted through rate models, which require extensive calibration efforts through independent experiments to determine relevant mass transfer and kinetic rate constants. Here we propose machine learning as a complementary predictive tool for column performance. A machine learning algorithm, extreme gradient boosting, was applied to a large data set of packing quality (plate height and asymmetry) for pre‐packed columns as a function of quantitative parameters (column length, column diameter, and particle size) and qualitative attributes (backbone and functional mode). The machine learning model offered excellent predictive capabilities for the plate height and the asymmetry (90 and 93%, respectively), with packing quality strongly influenced by backbone (∼70% relative importance) and functional mode (∼15% relative importance), well above all other quantitative column parameters. The results highlight the ability of machine learning to provide reliable predictions of column performance from simple, generic parameters, including strategic qualitative parameters such as backbone and functionality, usually excluded from quantitative considerations. Our results will guide further efforts in column optimization, for example, by focusing on improvements of backbone and functional mode to obtain optimized packings

PubMed Central

Edinburgh Research Explorer

The nucleotide composition of microsatellites impacts both replication fidelity and mismatch repair in human colorectal cells

Author: C. Richard Boland
Christoph Campregher
Christoph Gasche
Clemens Honeder
Manuela Nemeth
Theresa Scharl
Thomas Jascur
Publication venue: Oxford University Press

Microsatellite instability is a key mechanism of colon carcinogenesis. We have previously studied mutations within a (CA)13 microsatellite using an enhanced green fluorescent protein (EGFP)-based reporter assay that allows the distinction of replication errors and mismatch repair (MMR) activity. Here we utilize this assay to compare mutations of mono- and dinucleotide repeats in human colorectal cells. HCT116 and HCT116+chr3 cells were stably transfected with EGFP-based plasmids harboring A10, G10, G16, (CA)13 and (CA)26 repeats. EGFP-positive mutant fractions were quantitated by flow cytometry, mutation rates were calculated and the mutant spectrum was analyzed by cycle sequencing. EGFP fluorescence pattern changed with the microsatellite's nucleotide sequence and cell type and clonal variations were observed in mononucleotide repeats. Replication errors (as calculated in HCT116) at A10 repeats were 5–10-fold higher than in G10, G16 were 30-fold higher than G10 and (CA)26 were 10-fold higher than (CA)13. The mutation rates in hMLH1-proficient HCT116+chr3 were 30–230-fold lower than in HCT116. MMR was more efficient in G16 than in A10 clones leading to a higher stability of poly-G tracts. Mutation spectra revealed predominantly 1-unit deletions in A10, (CA)13 and G10 and 2-unit deletions or 1-unit insertion in (CA)26. These findings indicate that both replication fidelity and MMR are affected by the microsatellite's nucleotide composition

Crossref

PubMed Central

Using neighborhood graphs for the investigation of E. coli gene clusters

Author: Leisch Friedrich
Scharl Theresa
Publication venue: Sociological Research Online
Publication date: 01/01/2008

Clustering is commonly used in the analysis of gene expression data to find groups of co-expressed genes. The definition of gene clusters is not very clear as genetic interactions are extremely complex. For this reason the relationship between clusters is very important as co-expressed genes can end up in different clusters. The neighborhood graph is a useful tool to visualize the cluster structure. In this paper the R package gcExplorer is presented, which is an interactive toolbox for the exploration of gene clusters. Additional information about the gene clusters like the annotation of genes to functional groups (e.g., GO categories) can easily be investigated. The new visualization toolbox is demonstrated on microarray data from E. coli.

Research Online

R package gcExplorer: graphical and inferential exploration of cluster solutions

Author: Leisch Friedrich
Scharl Theresa
Publication venue: Sociological Research Online
Publication date: 01/01/2009

Cluster analysis is commonly applied to microarray data in order to find groups of co-expressed genes, where cluster algorithms with the ability to visualize the resulting cluster objects (e.g., a dendrogram for hierarchical clustering) are usually preferred. The display of cluster solutions, particularly for a large number of clusters, is very important in exploratory data analysis. It gives practitioners an idea of the relationships between segments of a partition and allows them to interpret the cluster results. Neighborhood graphs (Leisch, 2006) can be used for visual assessment of the cluster structure of centroid-based cluster solutions. In a neighborhood graph each node represents a cluster, and two nodes are connected if there exist data points that have the two corresponding centroids as closest and second closest centroid. In this work we present new visualization methods based on the neighborhood graph. For node representation, different plot symbols visualizing single clusters are used, allowing a quick overview of the data. On the one hand, the corresponding data points themselves can be visualized using, for example, line diagrams for gene expression over time. On the other hand, node symbols like pie charts can be used to visualize further properties of the clusters, like association to functional groups under study. Finally, the neighborhood graph can be used for the validation of a cluster solution, e.g., by testing the relationship between a clustering and a priori information about gene functions. All visualization methods and test procedures used are implemented in R package gcExplorer (Scharl and Leisch, 2009), which is now available on CRAN. The grid-based node symbols are implemented in R package symbols (http://r-forge.r-project.org/projects/symbols/).
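The edge rule stated in this abstract (two clusters are connected when some data point has their centroids as closest and second closest) can be illustrated with a minimal sketch. This is not the gcExplorer implementation: the function name `neighborhood_graph`, the use of Euclidean distance, the undirected edges, and the toy data are all assumptions made for illustration. The edge value here is simply the number of supporting points.

```python
import math
from collections import defaultdict


def neighborhood_graph(points, centroids):
    """Build a simplified, undirected neighborhood graph.

    Returns a dict mapping (i, j) with i < j to the number of points whose
    closest and second-closest centroids are i and j (in either order).
    Euclidean distance is an assumption of this sketch.
    """
    if len(centroids) < 2:
        raise ValueError("need at least two centroids")
    edges = defaultdict(int)
    for p in points:
        # Rank centroid indices by distance to this point.
        ranked = sorted(range(len(centroids)),
                        key=lambda k: math.dist(p, centroids[k]))
        first, second = ranked[0], ranked[1]
        edges[(min(first, second), max(first, second))] += 1
    return dict(edges)


# Hypothetical toy data: three centroids and four points in 2 dimensions.
centroids = [(0.0, 0.0), (4.0, 0.0), (0.0, 10.0)]
points = [(0.5, 0.0), (3.0, 0.5), (1.0, 1.0), (0.0, 9.0)]
graph = neighborhood_graph(points, centroids)
print(graph)
```

In this toy example clusters 0 and 1 are connected by three points and clusters 0 and 2 by one, while clusters 1 and 2 share no edge because no point has those two centroids as its nearest pair.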

Research Online

Analysis of gene expression time-course data using cluster techniques

Author: Scharl-Hirsch Theresa
Publication date: 01/01/2009

Translated from the German abstract: This dissertation deals with various aspects of cluster analysis for the evaluation of time-course microarray data. For several years, the interpretation of huge amounts of data from microarray experiments has been a major challenge for statistics and bioinformatics. Time-course microarray experiments make it possible to study the gene expression of thousands of genes simultaneously. Since genes with similar expression patterns are often also co-regulated, clustering gene expression profiles can help to find co-regulated genes. Ultimately, cluster analysis can help to find functional metabolic pathways and interactions between genes. In this dissertation, partitioning cluster methods such as K-Means and the quality-based cluster algorithm QT-Clust as well as model-based clustering are investigated. Either the original data or the functional data are clustered. In functional data analysis, a curve is fitted to each observation in order to account for time dependency. In simulation studies on artificial data sets, the properties of different cluster methods are investigated and tested for their usefulness on real data. New cluster methods for this type of data are presented, as well as several methods for evaluating cluster solutions. All cluster algorithms and evaluation methods were implemented in R, and all simulations were carried out in R. A substantial part of the work focuses on the exploratory analysis of cluster solutions. Since genetic interactions are very complex, defining gene clusters is difficult. Relationships between clusters are of great importance because co-expressed genes can very easily be grouped into different clusters. The visualization of cluster solutions helps to gain a better understanding of the cluster structure of the data and facilitates the interpretation of cluster solutions. Neighborhood graphs enable a graphical representation of the relationships between adjacent clusters. Various visualization methods for the interactive investigation of cluster solutions were developed and implemented in the R package gcExplorer. The functionality of the package includes the visualization of the cluster structure, the display of individual clusters as graphics or HTML tables, the highlighting of particular properties of clusters, and several test procedures for assessing the quality of cluster solutions. Finally, the application of the various cluster methods and the use of the package are illustrated with several examples using E. coli data from the Department of Biotechnology at the University of Natural Resources and Applied Life Sciences in Vienna.
This thesis is concerned with different aspects of the analysis of gene expression time-course data using cluster techniques. The interpretation of enormous amounts of data from microarrays has been a challenging task in statistics and bioinformatics for the past few years. Time-course microarray experiments make it possible to look at the gene expression of thousands of genes at several time points simultaneously. Genes with similar expression patterns are likely to be co-regulated. Hence clustering gene expression patterns may help to find groups of co-regulated genes or to identify common temporal or spatial expression patterns.
Finally, cluster results can suggest functional pathways and interaction between genes. The cluster methods investigated in this thesis include partitioning cluster methods like the well-known K-Means or the quality-based cluster algorithm Stochastic QT-Clust as well as model-based clustering. Clustering is either carried out on the raw data or on functional data. In functional data analysis a curve is fit to each observation in order to account for time dependency. In simulation studies on artificial and real data sets from publicly available databases, the properties of different cluster methods are compared and evaluated using the adjusted Rand index, the sum of within-cluster distances as well as the likelihood criterion. Additionally, test procedures are developed that make it possible to judge the biological relevance of cluster solutions. All cluster algorithms and evaluation procedures are implemented in the statistical computing environment R and all simulations are performed in R. An essential part of this thesis deals with the visualization of cluster solutions. The definition of gene clusters is not very clear as genetic interactions are extremely complex. For this reason the relationships between clusters are very important as co-expressed genes can end up in different clusters. The visualization of cluster solutions helps to get an understanding of the cluster structure of the data and makes it easier to interpret the cluster results. Neighborhood graphs allow for visual assessment of relationships between adjacent clusters. A new visualization toolbox for the interactive exploration of cluster solutions is implemented in R package gcExplorer. The functionality of the package includes the visualization of the cluster structure in the form of neighborhood graphs, the display of gene clusters in graphics or HTML tables, highlighting additional properties of the clusters as well as test procedures to judge the quality of cluster solutions.
Finally, the methods are applied to E. coli data sets from the Department of Biotechnology at the University of Natural Resources and Applied Life Sciences in Vienna.
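The adjusted Rand index named in this abstract as an evaluation criterion can be computed from the contingency table of two partitions. The following is a small sketch of the standard pair-counting formula, not code from the thesis (which reports an R implementation). The function name `adjusted_rand_index`, the convention of returning 1.0 for degenerate partitions, and the example label vectors are assumptions.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have equal length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items")
    # Pairs of items placed together in both partitions (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (e.g. a single cluster in both partitions);
        # returning 1.0 is a convention assumed in this sketch.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical label vectors for illustration.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])     # same grouping, relabeled
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # groupings cut across each other
partial = adjusted_rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
print(same, crossed, partial)
```

The index equals 1 when two partitions agree up to relabeling, is close to 0 for agreement at chance level, and can be negative when the partitions agree less than chance would predict.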

reposiTUm